The Third CIPS - SIGHAN Joint Conference on Chinese Language Processing

نویسنده

  • Hailong Cao
چکیده

It is well-known that interpretation of a text requires understanding of its rhetorical relation hierarchy since discourse units rarely exist in isolation. Such discourse structure is fundamental to document-level applications, such as text understanding, summarization, knowledge extraction and question-answering. In comparison with English, there are only a few studies on Chinese discourse analysis, due to the lack of appropriate theories to Chinese discourse structure representation and large-scale well-accepted corpora. In this talk, I will present a novel discourse structure representation scheme for Chinese, called Connectivedriven Dependency Tree (CDT), and describe our adventure in corpus annotation of the Chinese Discourse Treebank (CDTB) of 500 documents, using a top-down strategy to keep consistent with Chinese native’s cognitive habit. BIO: Zhou Guodong received the Ph.D. degree in computer science from the National University of Singapore in 1999. He joined the Institute for Infocomm Research, Singapore, in 1999, and had been an associate scientist, scientist and associate lead scientist at the institute until August 2006. Currently, he is a distinguished professor at the School of Computer Science and Technology, Soochow University, Suzhou, China. His research interests include natural language processing, information extraction and machine learning. Currently, he is an associate editor of ACM Transaction on Asian Language Information Processing(2010.07-2016.06), an editorial member of Journal of Software (Chinese)(2012.012014.12) and a vice chair of Technical Committees on Chinese Information/China Computer Federation(2010.12-2016.12), Computational Linguistics/Chinese Information Processing Society of China and Natural Language Understanding/Artificial Intelligence Society of China. Besides, he had been a member of the Editorial Board of Computational Linguistics (2010.01-2012.12).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

CRF-based Experiments for Cross-Domain Chinese Word Segmentation at CIPS-SIGHAN-2010

This paper describes our experiments on the cross-domain Chinese word segmentation task at the first CIPS-SIGHAN Joint Conference on Chinese Language Processing. Our system is based on the Conditional Random Fields (CRFs) model. Considering the particular properties of the out-of-domain data, we propose some novel steps to get some improvements for the special task.

متن کامل

Chinese Personal Name Disambiguation Based on Person Modeling

This document presents the bakeoff results of Chinese personal name in the First CIPS-SIGHAN Joint Conference on Chinese Language Processing. The authors introduce the frame of person disambiguation system LJPD, which uses a new person model. LJPD was built in short time, and it is not given enough training and adjustment. Evaluation on LJPD shows that the precision is competitive, but the reca...

متن کامل

SIR-NERD: A Chinese Named Entity Recognition and Disambiguation System using a Two-Stage Method

This paper presents our SIR-NERD system for the Chinese named entity recognition and disambiguation Task in the CIPS-SIGHAN joint conference on Chinese language processing (CLP2012). Our system uses a two-stage method and some key techniques to deal with the named entity recognition and disambiguation (NERD) task. Experimental results on the test data shows that the proposed system, which incor...

متن کامل

Word Segmentation on Chinese Mirco-Blog Data with a Linear-Time Incremental Model

This paper describes the model we designed for the word segmentation bakeoff on Chinese micro-blog data in the 2nd CIPS-SIGHAN joint conference on Chinese language processing. We presented a linear-time incremental model for word segmentation where rich features including character-based features, word-based features as well as other possible features can be easily employed. We report the perfo...

متن کامل

Personal Attributes Extraction Based on the Combination of Trigger Words, Dictionary and Rules

Personal Attributes Extraction in Unstructured Chinese Text Task is a subtask of The 3rd CIPS-SIGHAN Joint Conference on Chinese Language Processing (CLP-2014). In this report, we propose a method based on the combination of trigger words, dictionary and rules to realize the personal attributes extraction. We introduce the extraction process and show the result of this bakeoff, which can show t...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014